fix(recipes): fix NIM operator validation and demo script issues #483

Open
yuanchen8911 wants to merge 2 commits into NVIDIA:main from yuanchen8911:feat/nim-operator-recipe

@yuanchen8911
Contributor

Summary

Follow-up fixes to #478 (merged) addressing review findings:

  1. Revert health check file loading: ApplyRegistryDefaults was loading healthCheck.assertFile content into HealthCheckAsserts, which activated the chainsaw binary path in expected-resources. The deployment validator image (distroless) doesn't include chainsaw, causing runtime failures for all recipes with health checks.

  2. Add expectedResources for NIM operator: Without the chainsaw path, the NIM operator had no deployment validation. Added expectedResources with Deployment/k8s-nim-operator in the nvidia-nim namespace so expected-resources verifies the operator is running.

  3. Fix demo script port handling: nim-chat-server.sh now honors API_PORT/UI_PORT env var overrides, fails fast on port conflicts instead of killing unrelated processes, and detects port-forward failures before printing "Ready!".
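
The port handling described in item 3 could look roughly like the sketch below. This is not the actual nim-chat-server.sh: the default ports (8000, 8080) and the helper names `port_in_use`, `require_free_port`, and `wait_until_ready` are assumptions made for illustration, and the sketch relies on bash's `/dev/tcp` redirection rather than whatever tool the real script uses.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; defaults and helper names are hypothetical.

# Honor env var overrides, falling back to assumed defaults.
API_PORT="${API_PORT:-8000}"
UI_PORT="${UI_PORT:-8080}"

# port_in_use PORT: succeeds if something accepts TCP connections
# on 127.0.0.1:PORT (bash /dev/tcp, no external tools needed).
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# require_free_port NAME PORT: fail fast with a message instead of
# killing whichever process currently owns the port.
require_free_port() {
  if port_in_use "$2"; then
    echo "error: $1=$2 is already in use; set $1 to a free port" >&2
    return 1
  fi
}

# wait_until_ready PID PORT [TRIES]: succeed once PORT accepts
# connections; fail if the background process (for example a
# port-forward) exits first or the retries run out.
wait_until_ready() {
  local pid=$1 port=$2 tries=${3:-20}
  while [ "$tries" -gt 0 ]; do
    tries=$((tries - 1))
    kill -0 "$pid" 2>/dev/null || return 1
    port_in_use "$port" && return 0
    sleep 1
  done
  return 1
}
```

In a script built this way, `require_free_port` would run for both ports before anything is started, and "Ready!" would be printed only after `wait_until_ready "$!" "$API_PORT"` succeeds for the backgrounded port-forward.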

Test plan

  • go test -race ./pkg/recipe/... passes
  • Tests verify HealthCheckAsserts is NOT populated by ApplyRegistryDefaults
  • expectedResources validated on live EKS cluster with NIM operator deployed
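
For the live-cluster check, a manual spot check of the same Deployment might look like the sketch below. It is only an approximation of what expected-resources verifies, not the validator's implementation; the `check_nim_operator` helper, the `KUBECTL` override, and the 120s timeout are assumptions, while the namespace and Deployment name come from the summary above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: waits for the NIM operator Deployment to finish
# rolling out. KUBECTL can point at a different binary or wrapper.
check_nim_operator() {
  local ns="${1:-nvidia-nim}" name="${2:-k8s-nim-operator}"
  "${KUBECTL:-kubectl}" -n "$ns" rollout status "deployment/$name" --timeout=120s
}
```

Calling `check_nim_operator` with no arguments against a cluster where the recipe is deployed should return a zero exit status once the operator's rollout has completed.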

@yuanchen8911 requested review from a team as code owners April 2, 2026 17:17
@yuanchen8911 added the bug and area/recipes labels Apr 2, 2026
@github-actions bot added the size/M label Apr 2, 2026

Add k8s-nim-operator as a new AICR component and create an H100/EKS/Ubuntu
inference recipe for NIM. This supports the CNCF AI Conformance submission
where NIM on EKS is the certified product and AICR is the validation tooling.

- Add `nim` platform type to recipe criteria with tests
- Register k8s-nim-operator v3.1.0 in component registry with health check
- Create h100-eks-ubuntu-inference-nim overlay with DRA support
- Add NIMService workload manifest (Llama 3.2 1B)
- Add NIM chat demo UI (nim-chat-server.sh, nim-chat.html)
- Fix: load healthCheck.assertFile content in ApplyRegistryDefaults so
  deployment validation actually executes Chainsaw health checks

Closes NVIDIA#473

Labels

area/recipes, bug, size/M